Make a workflow diagram for developing the restoration activities data

The goal of this script is to build a workflow diagram showing how the water treatment projects were merged with habitat restoration projects to arrive at our final “restoration activities” dataset.

Install and use the DiagrammeR package to make the diagram

#install.packages("DiagrammeR")
#install.packages("DiagrammeRsvg")
#install.packages("rsvg")
library(DiagrammeR) #to make the graph
library(DiagrammeRsvg) #to export the graph to an svg object
library(rsvg) #to export the svg object to a png

Make a function to help with defining edge statements

To make a graph with DiagrammeR, you need to define how each node relates to the other nodes. In this case, we have two types of “nodes”: the variables or datasets we are working with (e.g., habitat restoration activities, or the final restoration activities) and the processes (e.g., specific R scripts) used to create them. Getting the syntax of the edge statements right by hand was tedious, so I wrote this function:

#make a function to help with writing edge statements; it returns the string 'node_1' -> 'node_2';
define_edge<-function (node_1,node_2){
  edge<-paste0("'",node_1,"' ->"," '",node_2,"';")
  return(edge)
}
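
As a quick check of what the helper returns, here is a minimal sketch using two short placeholder labels ("A" and "B" are made up for illustration and are not names from the project). The helper is repeated so the snippet runs on its own:

```r
# same helper as in the script, repeated so this snippet is self-contained
define_edge <- function(node_1, node_2) {
  paste0("'", node_1, "' ->", " '", node_2, "';")
}

# "A" and "B" are placeholder labels, not project names
example_edge <- define_edge("A", "B")
print(example_edge)
## [1] "'A' -> 'B';"
```

Each node name ends up in single quotes, with the arrow between them and a semicolon at the end, which is the shape the edge statements need later on.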

Set up some lists of variables and processes used

I chose to store the names of the initial and derived datasets in a vector called “varnames” and the names of the processes in a vector called “processes”, so I could access them later when making the diagram.

#set up a list of the names of variables
varnames<-c("Descriptions of\n All PS Treatment\n Projects", 
            "Completed\nPS \nProjects", 
            "Completed PS\n Project &\nDescriptions",
            "Categorized\n PS Treatment\n Projects",
            "Habitat \nRestoration\n Treatment\n Projects",
            "Categorized\n Habitat\n Restoration\n Projects",
            "Restoration \nActivities")
print("Here are the varnames:")
## [1] "Here are the varnames:"
varnames
## [1] "Descriptions of\n All PS Treatment\n Projects" 
## [2] "Completed\nPS \nProjects"                      
## [3] "Completed PS\n Project &\nDescriptions"        
## [4] "Categorized\n PS Treatment\n Projects"         
## [5] "Habitat \nRestoration\n Treatment\n Projects"  
## [6] "Categorized\n Habitat\n Restoration\n Projects"
## [7] "Restoration \nActivities"
#write a list of process steps
processes<-c("Rscript:\n Select \nCompleted Projects", 
             "Manual\n Categorization of\n PS Projects",
             "Manual\n Categorization of\n Habitat Projects",
             "Rscript:\n Merge Restoration\n Projects")

print("Here are the processes:")
## [1] "Here are the processes:"
processes
## [1] "Rscript:\n Select \nCompleted Projects"       
## [2] "Manual\n Categorization of\n PS Projects"     
## [3] "Manual\n Categorization of\n Habitat Projects"
## [4] "Rscript:\n Merge Restoration\n Projects"

Make the list of names for the variable names and processes for the diagram

The DiagrammeR package needs the names of the multi-line nodes in a specific format (enclosed in single quotes and separated by semicolons). For example: 'Descriptions of\n All PS Treatment\n Projects'; So, run a couple of loops to get the variable names and processes correctly formatted:

#create the list of names for the datasets (initial and derived)
nodes<-c()
for (v in varnames) {
  node<-paste0("'",v,"';")  #enclose the name from varnames in single quotes, end with a ;
  nodes<-rbind(nodes, node)
}

print("Here are the variable node names")
## [1] "Here are the variable node names"
nodes
##      [,1]                                               
## node "'Descriptions of\n All PS Treatment\n Projects';" 
## node "'Completed\nPS \nProjects';"                      
## node "'Completed PS\n Project &\nDescriptions';"        
## node "'Categorized\n PS Treatment\n Projects';"         
## node "'Habitat \nRestoration\n Treatment\n Projects';"  
## node "'Categorized\n Habitat\n Restoration\n Projects';"
## node "'Restoration \nActivities';"
proc_names<-c()
for (p in processes){
  proc<-paste0("'",p,"';") #enclose the name from processes in single quotes, end with a ;
  proc_names<-(rbind(proc_names,proc))
  
}

print("Here are the process node names")
## [1] "Here are the process node names"
proc_names
##      [,1]                                              
## proc "'Rscript:\n Select \nCompleted Projects';"       
## proc "'Manual\n Categorization of\n PS Projects';"     
## proc "'Manual\n Categorization of\n Habitat Projects';"
## proc "'Rscript:\n Merge Restoration\n Projects';"
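
The two loops above do the job, but since paste0 works element-wise on a whole vector, the same formatting could probably be done without a loop. The sketch below compares both approaches on two shortened, hypothetical labels (not the real varnames) and checks that they collapse to the same string:

```r
# shortened, hypothetical labels standing in for varnames
labels <- c("Dataset\n One", "Dataset\n Two")

# loop version, following the pattern used in the script
nodes_loop <- c()
for (v in labels) {
  nodes_loop <- rbind(nodes_loop, paste0("'", v, "';"))
}

# vectorized version: paste0 formats every element in one call
nodes_vec <- paste0("'", labels, "';")

# both give the same text once collapsed for the DOT specification
same <- identical(paste(nodes_loop, collapse = ""),
                  paste(nodes_vec, collapse = ""))
same
```

The only difference is the container: the loop with rbind builds a one-column matrix, while the vectorized call returns a plain character vector. Either works with paste(..., collapse = '').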

Define the edges

The grViz specification we are going to use requires a series of “edge statements” that define how the nodes connect to each other: for example, which variable is the input to which process, and how the output flows from there. This is where I’ll use the function defined above to make writing the edge statements easier, since each one needs to start with the source node name in single quotes, then the -> operator, then the receiving node in single quotes, and end with a semicolon (;).

#define the edges
edge1<-define_edge(varnames[1],processes[1])
edge2<-define_edge(varnames[2],processes[1])
edge3<-define_edge(processes[1],varnames[3])
edge4<-define_edge(varnames[3],processes[2])
edge5<-define_edge(processes[2],varnames[4])
edge6<-define_edge(varnames[4],processes[4])
edge7<-define_edge(varnames[5],processes[3])
edge8<-define_edge(processes[3],varnames[6])
edge9<-define_edge(varnames[6], processes[4])
edge10<-define_edge(processes[4],varnames[7])

#now, bind them all together (rbind gives a one-column matrix) to use later:
edges<-rbind(edge1,edge2,edge3,edge4,edge5,edge6,edge7,edge8,edge9,edge10)

print("here are the edge specifications:")
## [1] "here are the edge specifications:"
edges
##        [,1]                                                                                                  
## edge1  "'Descriptions of\n All PS Treatment\n Projects' -> 'Rscript:\n Select \nCompleted Projects';"        
## edge2  "'Completed\nPS \nProjects' -> 'Rscript:\n Select \nCompleted Projects';"                             
## edge3  "'Rscript:\n Select \nCompleted Projects' -> 'Completed PS\n Project &\nDescriptions';"               
## edge4  "'Completed PS\n Project &\nDescriptions' -> 'Manual\n Categorization of\n PS Projects';"             
## edge5  "'Manual\n Categorization of\n PS Projects' -> 'Categorized\n PS Treatment\n Projects';"              
## edge6  "'Categorized\n PS Treatment\n Projects' -> 'Rscript:\n Merge Restoration\n Projects';"               
## edge7  "'Habitat \nRestoration\n Treatment\n Projects' -> 'Manual\n Categorization of\n Habitat Projects';"  
## edge8  "'Manual\n Categorization of\n Habitat Projects' -> 'Categorized\n Habitat\n Restoration\n Projects';"
## edge9  "'Categorized\n Habitat\n Restoration\n Projects' -> 'Rscript:\n Merge Restoration\n Projects';"      
## edge10 "'Rscript:\n Merge Restoration\n Projects' -> 'Restoration \nActivities';"
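
Writing one define_edge call per edge is easy to follow, but with many edges it may be simpler to keep the sources and targets in two parallel vectors. This is a sketch under that assumption; the `from` and `to` vectors and their labels are hypothetical and only illustrate the pattern:

```r
# same helper as in the script, repeated so this snippet is self-contained
define_edge <- function(node_1, node_2) {
  paste0("'", node_1, "' ->", " '", node_2, "';")
}

# hypothetical labels for illustration only
from <- c("Raw data", "Clean step", "Clean data")
to   <- c("Clean step", "Clean data", "Merge step")

# paste0 is vectorized, so a single call builds every edge statement
edge_demo <- define_edge(from, to)

# collapse into one string, ready to drop into a DOT specification
edge_spec <- paste(edge_demo, collapse = "")
edge_demo
```

The trade-off is readability: numbered edge objects make each connection explicit, while parallel vectors are shorter but require the two vectors to stay aligned.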

Write the complicated statement that tells DiagrammeR how to make a graph

DiagrammeR has a grViz function that creates a Graphviz diagram, but the specification is written in the DOT language. All of the specifications for how the graph should look are contained in this one long statement. I also wanted to pass the values from my “nodes” and “edges” objects into the statement, so that changes to varnames or processes would propagate through the code without editing the specification by hand.

The grViz specification is one long string, so to be able to pass in the nodes and edges, I wrapped it in a paste0 call and stored the result in an object called config_statement, which I will later pass to the grViz function from DiagrammeR. The order matters: the variable nodes are declared right after the first node attributes (box, cadetblue3), while the process nodes first appear in the edge statements, after the second node attributes statement, so they pick up the diamond styling:

config_statement<-paste0("
digraph {
                          
                          # graph attributes - rankdir = LR makes it left to right rather than vertical
                          graph [overlap = true, rankdir=LR]
                          
                          # node attributes
                          node [shape = box,style=filled, 
                          fontname = Helvetica,
                          color = cadetblue3]
                          
                          # edge attributes
                          edge [color = gray]
                          
                          ", 
                          
                          paste(nodes,collapse=''),
                          
                          
                          "# node attributes
                          node [shape = diamond, style=filled,
                          color = sandybrown,
                          fixedsize = true,
                          width = 2.3,
                          height=1.8]
                          
                          # edge statements
                          ",
                         paste(edges,collapse=''),
                          "
                         
                          
                          
                          
                          }")
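
Because the specification is assembled from several pieces, it can help to print the final text before rendering it. The following is a minimal sketch of the same paste0 pattern on a tiny, hypothetical workflow (the labels "Input", "Process", and "Output" are placeholders); the rendering step only runs if DiagrammeR is installed:

```r
# hypothetical two-dataset, one-process workflow
demo_nodes <- paste0("'", c("Input", "Output"), "';")
demo_edges <- c("'Input' -> 'Process';", "'Process' -> 'Output';")

demo_spec <- paste0(
  "digraph {\n",
  "  node [shape = box]\n  ",
  paste(demo_nodes, collapse = ""), "\n",   # dataset nodes get the box shape
  "  node [shape = diamond]\n  ",
  paste(demo_edges, collapse = ""), "\n",   # 'Process' first appears here, so it is a diamond
  "}"
)

# print the assembled DOT text to check it before rendering
cat(demo_spec)

# render only if DiagrammeR is available
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  demo_chart <- DiagrammeR::grViz(demo_spec)
}
```

Running cat() on config_statement in the same way is a quick check that the nodes and edges landed where expected.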

Now, finally, create the graph:

This part is simple: just pass the config_statement created above to the grViz function from DiagrammeR to create the graph:

#plot the diagram using grViz (from the DiagrammeR package)
flowchart<-grViz(config_statement)

flowchart

Export the figure

Matt magically fixed the problem and figured out how to programmatically export the diagram to an svg object and then convert that to a png. We need two additional packages for this:

DiagrammeRsvg: contains the “export_svg” command, which converts the diagram to svg text.
rsvg: contains rsvg_png (and other commands for other formats), which converts the svg to a png.

svg<-export_svg(flowchart) #export to an svg string (held in memory, not written to a file)
## Warning in make_context(private$console): '.Random.seed' is not an integer
## vector but of type 'NULL', so ignored
## pre-main prep time: 1 ms
rsvg_png(charToRaw(svg),"restoration_activities.png")
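
If the diagram needs to be exported more than once, or at a specific resolution, the two export steps could be wrapped in a small helper. This is only a sketch: `save_diagram_png` and its default width are hypothetical, and it relies on the `width` argument of rsvg_png to set the output size in pixels:

```r
# hypothetical helper: save a grViz diagram as a png at a chosen pixel width
save_diagram_png <- function(diagram, file, width = 1200) {
  if (!requireNamespace("DiagrammeRsvg", quietly = TRUE) ||
      !requireNamespace("rsvg", quietly = TRUE)) {
    stop("DiagrammeRsvg and rsvg are both needed to export the diagram")
  }
  # convert the diagram to svg text, then render that text to a png file
  svg_text <- DiagrammeRsvg::export_svg(diagram)
  rsvg::rsvg_png(charToRaw(svg_text), file = file, width = width)
  invisible(file)
}

# usage, assuming `flowchart` was created with grViz as above:
# save_diagram_png(flowchart, "restoration_activities.png", width = 1600)
```

A larger width gives a sharper image for reports or slides, since the svg is rendered at that size instead of being scaled up afterwards.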